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Abstract 

We consider analysis of dependent functional data that are correlated because of 
a longitudinal-based design: each subject is observed at repeated time visits and for 
each visit we record a functional variable. We propose a novel parsimonious model¬ 
ing framework for the repeatedly observed functional variables that allows to extract 
low dimensional features. The proposed methodology accounts for the longitudinal 
design, is designed for the study of the dynamic behavior of the underlying process, 
and is computationally fast. Theoretical properties of this framework are studied 
and numerical investigation confirms excellent behavior in finite samples. The pro¬ 
posed method is motivated by and applied to a diffusion tensor imaging study of 


multiple sclerosis. Using Shiny (Chang et al., 20151 we implement interactive plots 


to help visualize longitudinal functional data as well as the various components and 
prediction obtained using the proposed method. 


Keywords'. Dependent functional data, Diffusion Tensor Imaging, Functional 
principal component analysis, Longitudinal design, Multiple Sclerosis. 


1 Introduction 


Longitudinal functional data refer to data consisting of functions (such as prohles or 
images) observed repeatedly for each subject at multiple instances, often visit times. 
Examples of such data include the Baltimore Longitudinal Study of Aging (BLSA), where 
daily physical activity count prohles are observed for each subject at several consecutive 


days (Xiao et ah, 2013, Goldsmith et ah, 2014); and the longitudinal diffusion tensor 


imaging (DTI) study, where modality prohles along well-identihed tracts are observed for 
each multiple sclerosis (MS) patient at several hospital visits (Greven et ah, 2010). As 


a result of an increasing number of such data, longitudinal functional data analysis has 


received much attention recently; see for example 

Morris et al 

. (12003) 

Morris and Garroll 

(2006) 

; Baladandayuthapani et al. 

(2 

008) 

Di et al. 

(2009 

); ( 

dreven et al. 

(2010) 

Staicu 

et al. 

2010) 

Ghen and Muller 

(2012 

; Li and Guan 

(2014 

). 


Our motivation conies from the longitudinal DTI study, where the objective is to study 
the evolution of the MS disease as given by the dynamics of the modality prohles along 
the corpus callosum tract of the brain. By “modality prohle” we mean measurements of 
a particular type of water dihusivity characteristics that are recorded at a hne grid of 
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locations along the tract; the modality we focns on here is fractional anisotropy (FA). 
The change in the FA prohles along the corpns callosnm over hospital visits is informative 
of the progression of the MS disease, and thns a statistical model for FA prohles, which 
incorporates longitndinal dependence, has the potential to be a very nsefnl tool in practice. 
In this paper we propose a modeling framework that captnres the process dynamics over 
time and also provides prediction of a fnll FA trajectory at a fntnre visit. 

Cnrrently available methods for analysis of longitndinal fnnctional data mainly sep¬ 
arate into two categories, based on whether or not they acconnt for the actnal time of 
snbject’s repeated visit, say Tij for the jth visit of snbject i. However, most existing meth¬ 
ods, inclnding those that incorporate Tij, cannot be nsed to predict a complete response 


trajectory at an nnobserved (fntnre) visit. For example, while Greven et ah (2010) nsed 
a fnnctional linear mixed model framework to model the process dynamics linearly in Tij, 
their approach is not able to provide prediction of a fnll trajectory at a fntnre visit. To the 
best of onr knowledge, the only available modeling approach that provides snch prediction 
is Chen and Muller (2012). Specihcally, their proposed model, henceforth denoted CM, 


is Yij{s 


= /i(s|Ty) 'Z.k>iiikiTij)(j)k{s\Tij), where Fy(-) is the jth repeated trajectory 
response measured at time T^, /i(-|Tjj) is the mean response function at T^j, 0fc(-|Tjj) 
is the kth leading direction at Tij, and ^ikiTij) = J{Yij{s\Tij) — fi{s\Tij)}(l)k{s\Tij)ds is 
the corresponding coefficient, which depends on Tij. Though providing future prediction, 
CM makes the understanding of the process dynamics over visit times, Tij, challenging. 
Additionally, the numerical investigation of CM in hnite samples is restricted to the case 
when the sampling design for T’s, at which the functional response is observed, is dense or 
moderately sparse, as well as when the response curves are observed with white noise mea¬ 


surement error that has small magnitude relative to the variation of the response (Chen 


and Muller (2012)). However, an important limitation of CM is its heavy computational 


cost; its rigorous study in diverse practical situations is almost impossible. 

In this paper we focus on the case where the sampling design of T’s is sparse (hence 
sparse longitudinal design) and the sampling design of s’s is dense (hence dense functional 
design). We provide a novel parsimonious modeling framework to (i) easily study the 
longitudinal dynamics of the functional response over visit times, and (ii) predict a full 
trajectory at a future visit. We propose the following model for the response curve F)j(-): 


Fij(s) /i(s,Tjj) -|-Aj(s,Tjj) -|- ejj(s), Xi(^s,Tij) ^ 


( 1 ) 


for s G 5 and Tij G T, where fi{-,Tij) is an unknown smooth mean response corresponding 
to Tij, Xi{-, Tij) is a smooth random deviation from the mean that varies over time Tij, and 
eij{-) is an error process with zero-mean and unknown covariance function. We assume 
that the bivariate processes Xfs are independent and identically distributed (iid), the 
error processes ejj’s are iid and furthermore are independent of Xfs. For identihability 
we require that Xi comprises solely the random deviation that is specihc to the subject; 
any repeated time-specihc deviation is viewed as part of e^. Here {(l>k{s)}k are orthogonal 
basis functions in T^[0,1] and ^ijk = ^ik{Tij) are the corresponding basis coefficients that 
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have zero-mean, are uncorrelated over i, but correlated over j; the notation, is 

to show that this correlation may depend on Tij. We assume that the set of visit times 
of all subjects, {T^j : i,j}, is dense in T. Full model assumptions are given in Section]^ 
The class of model Q is rich and includes many existent models, as we now illustrate, 
(i) If iikiTij) = + TijQi^ik for appropriately dehned random terms Co,iA: cind uiodel 

Q can be represented as in [Greven et ah (2010). (ii) If cov{^ik{T),^ik{T')) = XkPki\T — 
T'\]u) for some unknown variance A^, known correlation function Pfc(■;//) with unknown 
parameter u, and n = 1, model ([^ resembles to Gromenko et ah] ( |2012 ) and Gromenko and 
Kokoszka (2013) for spatially indexed functional data, (hi) If = J2i>i Ciki'ipikiiTij) 

with eigenfunctions 'ipikiiTys and the corresponding coefficients Ofcds, then model ([^ is 
similar to CM when 0fc(-|T) = 0fe(-) for all k and t. 


The model proposed in Chen and Muller (2012) is, in fact, closely related to model 
([^, but with one fundamental difference. Model (jlj) uses a time-invariant basis function, 
{4>k{-)}k, while Chen and Muller (2012) use a time-varying basis function, {4>k{-\Tij)}k- 
This key difference actually leads to the major advantages of the modeling framework pro¬ 
posed in this paper. First, by using a time-invariant basis function, the basis coefficients, 
^ik{Tij), extract the low dimensional features of these massive data with complex depen¬ 
dence structure. The longitudinal dynamics is emphasized only through the time-varying 
coefficients ^ik{Tij) of ([^, and thus this perspective makes the study of the process dy¬ 
namics easier to understand. Second, the estimation approach of CM requires to obtain 
eigenbasis functions {(j)k{-\Tij)}k at each Tij, by employing functional principal compo¬ 
nent analysis (FPCA) multiple times; our method requires to obtain only one set of basis 
functions, {4>k{-)}k- As the set {T^j : i,j} is dense in T, their estimation method in¬ 
volves a rather complex implementation and is computationally burdensome. In contrast, 
our method has computational advantages; the estimation is based on two dimensional 
smoother, which is faster than the three dimensional smoother required by the methods 
of Chen and Muller (2012). 

Using time-invariant basis functions has many appealing advantages. However, se¬ 
lecting a time-invariant basis is nontrivial. One option is to use a pre-specihed basis 
but it brings along the challenging issues of deciding which basis to use, as well as of 
selecting the optimal number of basis functions. Data-driven basis is another option 
though there is no obvious way to select it. Here we propose to determine {4>k{')}k using 
an appropriate marginal covariance operator S(s,s') = {s',T))g(T)dT, where 

c((s,T), {s',T')) is the covariance function of the process, X{s,T), and g{T) is the sam¬ 
pling density of T’s. In Sectionj^we show that the proposed basis, {(j)k{-)}k, has optimal 
properties with respect to some appropriately dehned criterion. From this view point, the 
model representation ([^ is optimal. The idea of using the eigenbasis of the pooled co- 
variance has been discussed recently in Jiang and Wang| (2010) and Pomann et ah (2013) 
for the case of independent functional data. 

The rest of paper is organized as follows. Section introduces the proposed model¬ 
ing framework. Section describes the estimation methods and implementation. The 
methodology is studied theoretically in Section and then numerically in Section Sec- 
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tion discusses the application to the tractography DTI data. The paper concludes with 
a brief discussion in Section 0 


2 Modeling longitudinal functional data 


Let [{Tjj, Yij{sr) : r = 1,..., R} : j = 1,..., rrii, ] be the data for the ith subject, where 
Yij{-) is the jth profile recorded at visit time, Rj for subject i, and each profile is observed 
at the fine grid of points {si,..., sr}. For notation simplicity we use the generic index s 
instead of and refer to s by ‘functional argument’. We assume that, for each subject 
i, the number of observed curves, m*, is relatively small, and the visit times, {Tij}j are 
random in a closed and compact interval, T. It is assumed that the set of time points of 
all subjects, {Rj : for all i,j}, is dense in T; we call the generic time T by ‘longitudinal 
argument’. Without loss of generality, we set 5 = T = [0,1]. 

Now consider model Q where our primary goal is to model the complex covariance 
function of the random process Xi{s,Tij) = Xij{s); we use this notation interchangeably. 
In our exposition it is helpful to represent the noise process eij{-) as the sum of two 
well identified components: one square integrable smooth component ei^ij{s), which has 
a covariance function r(s, s') = cov{ei_ij(s), ei^ij(s')} that varies smoothly over s, s' G [0,1] 
and a white noise component e 2 ,ij(s) with covariance cov{e 2 ,ij(s), e 2 ,ij(s')} = cr^ if s = s' and 
0 otherwise; only the component ei^jj(s) is relevant for the nontrivial process dependence. 

Let c((s,T), (s', T')) = E[Xj(s, T)Xi(s',T')] be the covariance function of the pro¬ 
cess Xi{-, ■) and g{-) be the sampling density of T. Define the marginal covariance 
function induced by Xj’s by S(s,s') = / c{{s,T), {s',T))g{T)dT] in Section 0 we show 
that S(s,s') is a proper covariance function ( [Horvath and Kokoszka 2012[ ). Denote by 
Wi{s,Tij) = Xi{s,Tij) + ei^ij{s); Wi is a bivariate random process defined on [0,1] x |0.1| 
and its induced marginal covariance is S(s, s') = E(s,s') -|-r(s,s'). Let {(l>k{s), Xk}k be 
the pair of eigenfunctions and eigenvalues from the spectral decomposition of H(s,s'), 
where {4>k{-) '■ k} forms an orthogonal basis in L^[0,1] and Ai > A 2 > ... > 0. Using 
arguments similar to the standard FPCA, the eigenbasis functions {4>k{') '■ k = 1,..., K} 
are optimal in the sense that they minimize the following weighted mean square error: 
MSE («,> fg(T)dT, where 
</i(-)./2(') >= Jo fii^)f 2 {-s)ds is the usual inner product in L^[0,1]. 

Using the orthogonal basis in L^[0,1] {4>k{-)}k, we can represent the square integrable 
smooth process Wi{-,T) as Wi{s,Tij) = ^w,ijk4>k{s), where 


^W,ijk J fUj(s, Tjj)0fc(s)cis ^ik{'Yij') eijkt (2) 

and ^w,ijk are not necessarily uncorrelated over k. Here iik{Tij) = / Xi{s,Tij)(j)k{s)ds and 
Cijk = J ei^ij{s)(j)k{s)ds are specified by the definition of Wp, for fixed k these terms are 
mutually independent due to the independence of the processes Xj and ei^^. For each fc, 
one can easily show that, ^ik{-) is a smooth zero-mean random process in L^[0,1] and is 
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iid over i; by an abuse of notation in what follows we use interchangeably ^ik{Tij) = ^ijk- 
Furthermore Cijk are zero-mean iid random variables over i,j-, denote by their hnite 
variance. The above representation allows us to recover the latent signal Xi{-,Tij) and 
error as Xi{s,Tij) = Y.T=i^ikiTij)(j)k{s) and 6 i, 2 j(g) = eijk(f>k{s) 


The main difference between the proposed work and Chen and Muller (2012) is the use 
of the marginal covariance function, S(s,s'); CM use the conditional covariance function 
S(s, s'|T). The proposed approach reduces computational burden substantially, by avoid¬ 
ing the three dimensional covariance function and the spectral decomposition of S(s, s'|T) 
at every T. Also using time-invariant basis functions is critical in reducing the dimension¬ 
ality of the data in a way that captures the dependence over visit time; the longitudinal 
dependence (over T) can be studied through ^ijk = ^ifc(Tjj)’s. 

One way to model the dependence of the basis coefficients, ^ijk, is by using common 
techniques in longitudinal data analysis; for example by assuming a parametric covariance 
structure. As we discusse d in Section [I| this leads to models similar to Greven et ah (2010); 
Gromenko et al. (2012); Gromenko and Kokoszka (2013). We consider this approach in 


the analysis of the DTI data. Section Another approach is to assume a nonparametric 
covariance structure and employ a common functional data analysis technique. We detail 
the latter approach in this section. 

Specihcally, for each k > 1 denote by Gk{T,T') = cov{^ik{T),^ik{T')} the smooth 
covariance function in [0,1] x [0,1]. Mercer’s theorem provides the following convenient 
spectral decomposition Gk(T,T') = Y,i>iVki'ipki(T)'iljki(T'), where > hfc 2 > • • • > 0 and 
{'ipki{')}i>i is an orthogonal basis in T^[0,1]. Using the Karhunen-Loeve (KL) expansion, 
we represent ^ik{-) as: 


iik{Tij) = ''^Ciki'4’ki{Tij), Ciki = / iik{T)il)ki{T)dT, 


(3) 


i=i 


where Qki have zero-mean, variance equal to rjku and are uncorrelated over 1. 

By collecting all the components, we represent the model ([^ as = fi{s,Tij) + 

Y^T =1 Cikii^ki{Tij)(t)k{s) + eij(s), for eij{s) = ^1^=1 (^ijk4>k{s) + e 2 ,ij{s). In practice we 
truncate this expansion. Let K and L^’s such that Yij{s) is well approximated by the 
following truncated model based on the leading K and Lk respective basis functions 


K Lk 

Yi{s, Tij) = /i(s, Tij) -\- EE 

k=\ 1=1 

where eij{s) ~ J2k=i^ijk^k{s) + e 2 ,ij(s). The truncated model (j^ gives a parsimonious 
representation of the complex dependent longitudinal functional data; it allows the study 
of its dependence through two sets of eigenfunctions: one dependent only on s and one 
only on visit times, T^j. This approach involves two main challenges: hrst, determining 
consistent estimators of the orthogonal basis functions 0 fc(-)’s, and second estimating the 
covariance functions Gk{-,-) of tho time-varying coefficients ^ik{T), when the quantities 
lUj(s,Tjj)’s are not directly observable. 
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3 Estimation of model components 


Next we discuss estimation of all model components: Section 3.1 describes the estimation 
of the mean function. Section 0 presents the estimation of the marginal covariance 
function and of the eigenfunctions 0fc(-)’s, and Section [33 presents the estimation of the 
time-varying basis coefficients Prediction of Yi{s,T) is detailed in Section 3.4 


3.1 Step 1: Mean function p(s, T) 

In this paper we estimate the mean function, /i(s,T), using bivariate smoothing via bi¬ 
variate tensor product splines (Wood, 2006) of the pooled data {Yijr = Yij{sr) : i = 


1,..., n, j = 1,..., ruj and r = 1,..., i?}. Consider two univariate B-spline bases, {5^,1 (s), 
■ ■ ■, Bs,d,{s)} and {Bt,i{T), ..., BT,dj,iT)}, dehned on 5 = [0,1] and T = [0,1], re¬ 
spectively, where dg and dr are their respective dimensions. The mean surface is rep¬ 
resented as a linear combination of a tensor product of the two univariate B-spline 
bases /i(s,T) = Eqi=i EJ=i ^.,gi(s)5T,g2(^)/5giq2 = B(s,T)^/ 3, where B(s,T) is the 
known dsd^-dimensional vector of and (3 is the vector of unknown 

parameters, (dq^q 2 ?>. The bases dimensions, dg and dr, are set to be sufficiently large 
to accommodate the complexity of the true mean function, and roughness of the func¬ 
tion is controlled through the size of the curvature in each direction separately, i.e. 
JJT)/ds^YdTds = /3^(Ps®IdT)/3 in direction s, and //{9^/i(s, T)/dT'^}‘^dTds = 
/3^(Ids ® Pt)/ 3 in T. The penalized criterion to be minimized is 


[Yijr - B(s,., Tij)^f3] ^ + (3'^{XgPs ® Idx + -^rld, ® Pt)/3, 


(5) 




where A* and At are smoothing parameters that control the trade-off between the smooth¬ 
ness of the £t and the goodness of fit. The smoothing parameters can be selected by the 
restricted maximum likelihood (REML) or generalized cross-validation (GCV). It follows 
that the estimated mean function p(s,T) = B(s,T)^/3. 

While this method is a very popular smoothing technique of bivariate data, other avail¬ 
able bivariate smoothers can be used to estimate the mean /i(s,T); for example, kernel- 


based local linear smoother (Hastie et al., 2009), bivariate penalized spline smoother 


(Marx and Eilers, 2005) and the sandwich smoother (Xiao et ah, 2013). The sandwich 


smoother (Xiao et al., 2013) is especially useful in the case of very high dimensional data 


for its appealing computational efficiency, in addition to its estimation accuracy. 


3.2 Step 2: Marginal covariance. Data-driven orthogonal basis 

Once the mean function is estimated, let Yijr = Yijr — '[l{sr-, Tij) be the demeaned data. We 
use the demeaned data to estimate the marginal covariance function induced by IW(s, Tij), 
H(s, s') = S(s, s') -f r(s, s'). The estimation of H(s, s') consists of two steps. In the hrst 
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step, a raw covariance estimator S(s, s') is obtained; the pooled sample covariance is a 
snitable choice, if all the curves are observed on the same grid of points: 

n rrii n 

I—^(Sj., Sj./) ^ ^ ^ ^ ^ ^ • (6) 

i=l j=l i=l 


Because we assume that each observation, Yijr, is observed with white noise, e 2 ^ij{sr), the 
‘diagonal’ elements of the sample covariance, S(sr., Sr), are inflated by the variance of the 
noise, In the second step, the preliminary covariance estimator is smoothed by ignoring 
the ‘diagonal’ terms; see also Staniswalis and Lee (1998) and Yao et ah (2005)) who used 
similar technique for the case of independent functional data. In our simulation and data 
application we use the sandwich smoother (Xiao et ah, 2013). To ensure the positive 


semi-definiteness of the estimator any negative eigenvalues are zero-ed. The resulti^ 
smoothed covariance function, S(s,s'), is used as an estimator of S(s,s'). In Section]^ 
we show that S(s,s') is an unbiased and consistent estimator of S(s,s') in two settings: 
1) the data are observed fully and without measurement errors, i.e. eij{s) = 0 and 2) the 
data are observed fully and with measurement error of type ei^ij{s), i.e eij{s) = ei^ij{s). 

Let {0fc(s), Afcjfc be the pairs of eigenvalues/eigenfunctions obtained from the spectral 
decomposition of the estimated covariance function, S(s,s'). The truncation value K is 
determined based on pre-specified percentage of variance explained (PVE); specifically, 
K can be chosen as the smallest integer such that ( Yl!k=i is greater than 


the pre-specified PVE (Di et ah, 2009, Staicu et ah, 2010). 


3.3 Step 3: Covariance modeling of the longitudinal varying components 

Let ^w,ijk = J Yij{s)(j)k{s)ds be the projection of the jth repeated demeaned curve of the 
fth subject onto the direction 0fc(-) for k = 1,K. Since Vjj(s), is observed at dense 
grids of points {sr : r = 1,..., i?} in 5 for alH and j, ^w,ijk will be approximated accu¬ 
rately through numerical integration. It is easy to see that the version of ^w,ijk that uses 
fi{s,Tij) in place of and (f>k{s) in place of (f>k{s) converges to ^w,ijk with prob¬ 

ability one, as R diverges. The time-varying coefficients ^w,ijk are proxy measurements 
of ^ijk = iikiTij) and will be used to study the process dynamics. Estimation of their 
covariance is important as it describes the process temporal dependence but also because 
it allows prediction at unobserved time points T G [0,1]. One alternative is to stack the 
coefficients S,w,ijk over k and study the time-dependence of the resulting iP-dimensional 
vector. We take a simpler way and study the temporal dynamics separately for each k. 

Let {(Tij,iCw,ijfc) : 3 = be the ‘observed data’ that will be used to esti¬ 

mate the temporal covariance. A possible approach is to assume a parametric model 
for the latent temporal covariance, such as AR(1) or a random effects model framework; 
such approach would be preferable when there are very few curve measurements per 
subject and the longitudinal design is balanced. We discuss random effects model for 
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estimating the longitudinal covariance in the data application. Here we consider a more 
flexible approach and estimate the covariance nonparametrically. Speciflcally, recall that 
Gk(T,T') = cov{^ik(T),^ik(T')} and our framework as described in Q is common for 
modeling discretely observed functional data: ^w,ijk = Yll^iCiki'^kiiTij) + Cijk where Qki 
are random variables with zero mean and r]ki variances, {‘ipki{T),'r]ki}i are the pairs of 
eigenfunctions and eigenvalues of the covariance Gk, and CijkS are iid with zero-mean and 
variance equal to cXg^. Though we do not observe ^w,ijkS we use the proxy iw,ijk^ and 


estimate Gk by employing FPCA techniques for sparse functional data (Yao et ah, 2005). 
For completeness we summarize the approach below. Following Yao et ah (2005), 


we first obtain the raw sample covariance, Gik{Tij,Tiji) = ^w,ijk^w,ijk- Then the esti¬ 
mated smooth covariance surface, Gk{T,T'), is obtained by using bivariate smoothing 


of {(Tij,Tij:),Gik(Tij,Tij/) : i,j ^ j'}. Kernel-based local linear smoothing (Yao et al. 


2005) or penalized tensor product spline smoothing (Wood, 2006) can be used at this step. 


The diagonal terms T^/) : i,j = f} are removed because the noise Cijk present 

in the proxy ^w,ijk leads to inflated variance function. Let {iJki{,-),Vki}i be the pairs of 
eigenvalues/eigenfunctions of the estimated covariance surface, Gk{T, T'). The truncation 


value, Lk, is determined based on pre-specifled PVE; using similar ideas as in Section 3.2 


The variance is estimated from the difference between the variance along the diago¬ 
nal, or a smooth e stimator o f the r aw diagonal terms, and the predicted covariance for 


T = r, Gfc(T,T); 


Yao et al. 


(2005) discusses an alternative that dismisses the terms at 

Once the basis functions {V’fcKOItn 


the boundary when estimating the error variance, 
eigenvalues rjki^s, and error variance are estimated, the above model framework can 
be viewed as a mixed effects model and the random components Qki can be predicted 
using conditional expectation and a jointly Gaussian assumption for CpTe’s and CijkS. In 

particular, Qki = HCiki\lw,ik] = '^w,ik^ where = {^kiiTu ),.. .,^ki{Tim,)V 

is the mj-dimensional column vector of the evaluations of V’fcz(') at {Tjj : j = 1,... ,mi}, 
is a m* X rrij - matrix with (j, j')th element equal to GkiTij, Tif) + '^‘lk^jj'-i 1 

of j = i' and 0 otherwise, and ^.wik tbe rrii dimensional column vector of iw,ijk^- The 
predicted time-varying coefficients corresponding to a generic time T are obtained as 


lk{T) = Y,liki^ki{T). (7) 

i=i 

) proved the consistency of the estimators of the model components and 
trajectories when iw,ijk^ are observed directly. In Section]^ we extend 
these results to the case when the proxy ^w,ijk^ are used instead, and when the observed 
curves hij(-) are fully observed and the measurement error is of the type eij{s) = ei^ij(s); 
i.e. the data are observed with smooth error. 


Yao et al. (2005 


the predicted score 


























3.4 Step 4: Trajectories reconstruction 

We are now able to predict the full response curve at any time point T m[0,1] by: 

K 

Yi{s,T) =^{s,T)+ ^Zk{T)^k{s), se[0,1]; (8) 

k=l 


where fifc(T) = Ya=i Qki^kiiT). 


3.5 Implementation using available softwares 


An important advantage of the proposed approach is that its implementation can be 
carried using available software. 


Step 1. Estimate the smooth mean function /r(s, T) using the sandwich smoother ( |Xiao 
et al.| 2013) (the fbps function in R (R Core Team, 2014) package refund (Ciprian 


Crainiceanu et ah, 2014)) or using the penalized tensor product spline smoothing 


(the gam and te functions in R (R Core Team, 2014) package mgcv (Wood, 2011)). 


Step 2. Estimate the smooth covariance function S(s, s') with the demeaned data (dense 


design) using the sandwich smoother (Xiao et ah, 2013) and g et the eigenfunctions 
0fc(s) using the fpea.face function in the refund package (Ciprian Crainiceanu 


et al. 


2014). The default option of this function also provides ^w,ijkS- 


Step 3. For each k, carry out FPCA of {Tij,^w,ijk '■ hj}- There are several available options 
for implementation; fpca.se function in the refund package (Ciprian Crainiceanu 


et ah, 2014) and fpca.mle and fpca.pred functions in the FPCA package (Peng and 


Paul 


2009 


(Yao et al. 


James et ah, 2000) in R. Alternatively one can use the FPCA function 


2005) in the MATLAB (MATLAB, 2014) package PACE (Yao et ah, 2005) 


available at http: //www. stat. uedavis . edu/PACE/. 
Step 4. Determine the predicted trajectories using equation 


4 Theoretical properties 

We discuss now the asymptotic properties of the model components estimators and the 
predicted trajectories. Our study is based on the assumption of a sparse longitudinal 
design (modest number of repeated curve measurements per subject) and dense func¬ 
tional design; this requires new techniques than the ones commonly used for theoretical 


investigation of repeated functional data (such as Chen and Muller (2012)) 


Throughout this section we assume that response trajectories, Yjj(-)’s, have zero-mean 


and are observed fully as a function over the domain, S = [0,1]. Section 4.1 discusses the 


main theoretical results when data, l^j(s)’s, are observed without error, i.e. 


eij[s, 


= 0 
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for s G [0,1]. Section 4.2 extends the findings to the case when the observed data are 
corrnpted with a smooth error process eij{s) = i.e. there is no white noise 

measnrement error. The proofs of the results are sketched in the Appendix and described 


to greater detail in the Supplementary Material. Section [4^ discusses on how to relax some 
of the assumptions. For clarity, throughout this section we use S and T to distinguish 
between the domains for the functional argument and longitudinal one, respectively. 

We assume that the bivariate process Xi{s,Tij) is a realization of a true random 
process, X{s,T), with zero-mean and smooth covariance function, c{{s,T), {s',T')) = 
cov{X{s,T), X{s',T')}, which furthermore satisfies some regularity conditions. 

(Al.) X = {X(s,T) : (s,T) G iS X T} is a square integrable element of the L‘^{S x T), 
i.e. E[JJ X'^{s,T)dsdT] < oo, where S and T are compact sets. 

(A2.) The sampling density g{T) is continuous and suprp^j. \g{T)\ < oo. 

We first verify that the marginal covariance, S(s,s'), is a proper covariance func¬ 
tion (Horvath and Kokoszka, 2012, p.24). Under the assumptions (Al.) and (A2.), the 


marginal covariance function, E(s,s') = J c{{s,T), {s',T)}g{T)dT, (i) is symmetric, (ii) 
is positive definite, and (iii) have eigenvalues satisfying that YlT=i is finite. 


4.1 Response curves measured without error 

Assume that there is no error eij{s) = 0 and thus Yij{s) = Xi{s,Tij) for s G 5. Then 
the sample covariance of Yij{s) is S(s, s') = / (Sr=i 

show that S(s,s') is an unbiased estimator of the true marginal covariance, S(s,s), and 
then we prove that it is a consistent estimator. The following assumptions are important 
in our theoretical development. 


(A3.) E[X{s,T)X{s',T)X{s,T')X{s',T')] < oo for arbitrary s, s' G 5 and T,T' eT . 
(A4.) E[||X(-,T)f] < oo for each T G T. 


Conditions |(A3.) and (A4.) regard the moment behavior of the latent process X and 


are commonly used in functional data analysis (Yao et ah, 2005, Chen and Muller, 2012). 


Theorem gives the asymptotic properties of the marginal covariance estimator E(s,s'). 


Theorem 1 Assume (Al.) - (A3.) hold. Then we have that fo r each (s,s'), |S(s,s') — 
S(s,s')| 0 as n diverges. If the additional assumption (Af.) holds, then we have that 




0 as n 


oo. 


(9) 


where ||A:(-,-)||s = {// k‘^{s, s')dsds'}^/'^ is the Hilbert-Schmidt norm ofk{-,-). 
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By the consistency resnlt ([^ and Theorem 4.4 and Lemma 4.3 of Bosq (2000, p.l04), 
it follows that the eigen-elements of S(s, s') are consistent estimators of the corresponding 
eigen-elements of S(s, s'), if the eigenvalnes are neither crossing nor the same. 


(A5.) Let Ofc = (Ai — A 2 ) for A; = 1 and = max[(Afc-i — Xk), (Afc — A^+i))] otherwise, 
where Xk is the kth largest eigenvalnes of S(s,s'). Assnme that 0 < 0 ^ < 00 and 
Afc > 0 for all k (No crossing or ties among eigenvalnes). 


Corollary 1 Under the assumptions (Al.)\(A2.) , (A4-.), and (A5.), for each k 


\Xk - Xk\ 0, and \\^k{-) - 0fc(- 


0 as n diverges. 


( 10 ) 


Next, we focns on the estimation of the covariance that describes the longitndinal 
dynamics. If fw,ijkS were available to estimate the covariance fnnction, Gk{Tij,Tiji) = 
cov{^ik{Tij), ^ik{Tiji)), then the consistency of the model components follows trivially from 
the FPCA properties developed in Yao et ah (2005). However, are not directly ob¬ 

served; instead available are the proxy time-varying coefficients ^w,ijk = f Ai(s, Tij)(l){s)ds. 
We first show the nniform consistency of ^w,ijk, then nse this resnlt to show that the es¬ 
timator of Gk(Tij,Tij/) based on ^w,ijk is asymptotically identical to that based on ^w,ijk- 


Consis tency resnlts of the remaining model components follow directly from Yao et ah 


(2005). The nniform consistency of ^ijk reqnires snp^ g|Li(s, Tjj)| to be bonnded almost 

The Ganssian assnmption 


snrely, and this is ensnred by assnming (A6.( 


for the consistency of Qki obtained nsing the PACE method of 


Yao et al. 


(A8.) is needed 


(2005). 


(A6.) E[snp^-p|X(s,T)|“] < for a constant, M > 0, and an arbitrary integer, a > 1; 
This is eqnivalent to assnme that X{s,T) is absolntely bonnded almost snrely. 

(AT.) Let bki = {r]ki-'nk2) for / = 1 and bki = max[{r]k(i-i)-T]ki), (Vki-Vkii+i)))] otherwise, 
where pki is the Zth largest eigenvalnes of Gk{T,T'). Assnme that 0 < 6^/ < 00 and 
Tjki > 0 for all k and I (No crossing or ties among eigenvalnes). 

(A8.) Qiki and eijk are jointly Ganssian. 


Theorem 2 Under the assumptions (Al.), (A2.), \(AJi..) - (A6.), for each k 


j\f,W,ijk f,W,ijk\ t 0 


( 11 ) 


and 

||G,(-,-)-G,(-,-)||s Ao (12) 

as n diverges. In fact a stronger result also holds, namely the uniform convergence of 
Gk{T,T'). Specifically, supj,\Gk{T,T') — Gk{T,T')\ A- 0 as n diverges. 
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Corollary 2 Under the assumptions (Al. - for each k and I the 

eigenvalues pki and eigenfunctionsofGk{-,-) satisfy 


\fiu - >tal 0, and 0 


(13) 


as n diverges. Uniform convergence of also holds: suprp\^ki{T) — i/jki{T)\ 0. 

Furthermore, as n diverges, the estimator of the error variance, and the functional 

principal component scores, Qki’s, satisfy 


- Kk\ 0 \Cikl - Cikll 0 , 


(14) 


where = E[Ciki\$,w^Wik mi-dimensional column vector of ’s. 


The consistency results for all model components imply consistency of the predicted 
trajectories given in equation 


Conjecture 1 Under the assumptions for each (s, T) G 

SxT 


Cikl‘f’kl{T)4>k{s), 


(15) 


k=l 1=1 


as n, K and ’s diverge. 


4.2 Response curves measured with smooth error 

Assume next that the response curves are observed with smooth error eij{s) = ei^ij{s), 
and thus Yij{s) = Xi{s,Tij) + ei,jj(s) for s G 5 and smooth error process ei,ij(-) G L^{S). 

The main difference from Section 4.1 is that the sample covariance of Yij{s) is an 
estimator of S(s,s') = E(s,s') + r(s,s'), not of S(s,s'); as a result we denote the sam¬ 
ple covariance of Yij{s) by S(s, s') = X)r=i ^p('5)b)j(s')/ (Z)r=i ^an show, 

using similar arguments as earlier, that S(s, s') is an unbiased estimator of the marginal 
covariance function, S(s, s'). Moreover similar arguments can be used to show the point- 
wise consistency as well as the Hilbert-Schmidt norm consistency of S(s, s'). 

(A9.) Assume eij{s) is realization of e = {e(s) : s G 5}, which is square integrable process 
in recall ejj(s) = ei,jj(s) e 2 ,ij{s). 

(AlO.) E0|6(-)r] <'^- 

(All.) E[sup^|e(s)|“] < for a constant, M > 0, and an arbitrary integer, a > 1; this is 
equivalent to assume that e(s) is absolutely bounded almost surely. 


Corollary 3 Under the assumptions (Al.) - (A3.), and (A9J_, for each (s , s'), |E ( s, s' 


\s,s 


(AlO.) 


0 as n diverges. And under the assumptions (Al.), (A2.). (Af.), (A9.) 


0 and supAfy/^ijk — (,w,ijk\ —t 0 as n —> oo. 
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The proofs of these results are detailed in the Supplementary Material. 

As the smooth error process ei^ij{s) is correlated only along the functional argument, 
s, and are iid over i,j it follows that the theoretical properties of the predictions 

- of the time-varying coefficients and the response curve - hold without any modification. 


4.3 Extensions 


The theoretical results presented in this section are based on the assumption that data 
l^j(s)’s are observed (i) fully, (ii) without white noise, e 2 ,p(s) = 0 for all s, and (hi) have 
mean zero. These assumptions are made for convenience, and they can be relaxed as we 
now explain. 

(i) The assumption that l^j(s)’s are observed fully, as a continuous function, is quite 


common in theoretical study involving functional data; see for example Cardot et ah 
(2003, 2004); Chen and Muller (2012) among many others. One possibility to bypass 
this assumption is to use the corresponding smooth trajectories instead, (ii) Suppose 
that the profiles yij(-)’s are observed on dense grids of points and the measurements are 


additionally corrupted with white noise. Zhang and Chen (2007) showed that by smooth¬ 


ing each profile using local linear smoother, the true de-noised curves are recovered with 
asymptotically negligible error, in the case of independent curves. Another possibility 
to handle white noise is to use ideas similar to Yao et al. (2005). In Section]^ we illus¬ 
trate numerically the effect of white noise on the performance accuracy, (iii) Finally, the 
theoretical properties of the model components estimators remain valid, when the mean 


function is non-zero and a consistent mean estimator is available; Chen and Muller (2012) 


had considered this problem and showed that under suitable assumptions such consistent 
mean estimator can be obtained using bivariate smoothing. 


5 Simulation study 


We conduct a simulation study to investigate the finite sample performance of the pro¬ 
posed modeling approach. The performance is evaluated in terms of estimation accuracy 
for the main model components, in-sample and out-of-sample prediction accuracy, and 
computational time. Whenever applicable we compare our results to available alterna¬ 


tives such as Chen and Muller (2012). 


We generate Nsim = 1000 samples from model ([^ with K = 2, Yij{s) = ^{s,Tij) + 
6i(^q)0i(s)+ 62 ( 7 ij) 02 (s)+ eq(s), where p(s,T) = l + 2s + 3T+ 4:sT, and 0i(s) = 1 and 
(j) 2 {s) = v^sin(27rs). The grid of points for s is the set of 101 equispaced points in [0,1]. 
For each i, there are m* profiles associated with visit times, {Tij : j = 1,..., mj}; are 
randomly sampled from 41 equally spaced points in [0,1]. The random coefficients ^ik{T) 
are generated from various covariance structures as detailed below. Errors are generated 
from eij{s) = eiji0i(s) -1- eij 2 (j) 2 {s) + e 2 ,ij{s), where e^i, eij 2 and e 2 ,q(s) are mutually 


independent with zero-mean and variances equal to a; 


e,lWe,2 


and (T^, respectively. The 
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( 16 ) 


white noise variance, cr^, is set based on the signal to noise ratio (SNR), 


SNR 


ff var{Yi(s,T)}dsdT 

+ (^ 1,2 + 


We consider the following experimental factors: 


Case 1. covariance structure of the time-varying components: 

(a) non-parametric covariance (NP): iik{T) = Cikii’kiiT) + Cik2'4’k2{T), where 

(i) = y2cos(27rT), V’i 2 (T) = v^sin(27rT), On ~ N(0, 3), O12 ~ N(0,1.5); 

(ii) 02 i(T) = V2cos(47rT), 022(T) = y2sin(47rT), O21 ~ N(0, 2), O22 ~ N(0,1). 

(b) random effects model (REM): ^ik{T) = Ofco + with 



2.5 2 
2 3 



1 

1.5 



(c) exponential autocorrelation model (Exp): ^ik{T) is a Gaussian process with 
mean zero, variance Xk and auto-correlation function corr{0fc(7"), Ofc(^0} = P^k ^ ' 
denoted by GP{Xk,Pk)- We set Oi(T) ~ GP(4.5,0.9) and 02(T) ~ GP(3,0.5). 

Note that regardless of the generating models for ^ikiT), we have that J var{Ofc(P)}dT 
is equal to 4.5 and 3 for k = 1,2 respectively. 


Case 2. number of repeated measurements per subject: 

(a) TUi ~ Uniform({8, 9,..., 12}) (about 75% missing) 

(b) rrii ~ Uniform({15,16,..., 20}) (about 55% missing) 

Case 3. variance of eijk'- 

(^) (*^ 6,1 We, 2 ) = (0,0) (white noise only, i.e. eij{s) ~ iV(0,(T^)) 
(b) (c^e,i,t^e,2) = (0.7,0.3). 


The simulation results for the Case 3.(a), i.e. no smooth error, are included in 
the Supplementary Material. 

Case 4. signal to noise ratio: (a) SNR = 1 (cx^ = 6.5), and (b) SNR = 5 {a^ = 0.5) 

Case 5. number of subjects: (a) n = 100, (b) n = 300, and (c) n = 500 


For each generated sample of size n we form a training set and a test set. To determine 
the test set we randomly select 10 subjects from the sample. The test set is formed by 
collecting these subjects’ last functional observation; hence the test set contains 10 curves. 
The remaining functional observations for the 10 subjects and the data corresponding to 
the remaining subjects in the sample form the training set. Our model is fitted using the 
training set and the methods outlined in Section To be more specific, the bivariate 
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mean function, /i(s, T), is modeled using 50 cubic spline basis functions obtained from the 
tensor product of dg = 10 basis functions in direction s and dx = 5 in T. The smoothing 
parameters are selected via REML. The finite truncations K and are all estimated 
using the pre-specified level PVE = 0.95. 

Estimation accuracy for the model components is evaluated using integrated mean 
squared errors (IMSE): specifically, for the bivariate mean function IMSE(/i) = 

(s,T) —fi{s,T)}‘^dsdT/Nsim, and for the univariate eigenfunctions IMSE(0 a;) = 
~ 4>k{s)}^ds/Nsim, k = 1,2. The prediction performance is assessed 
through the accuracy in predicting the time-varying model components, ^ik{T), and 
in predicting the response curve, Yi{s,T). For the former assessment we use the in- 
sample integrated prediction errors (IPE) defined as IPE(^fc) = I ~ 

Cik'^{T)}‘^dT/ (nNsim), k = 1,2. For the later assessment we use the in-sample IPE 


^sim 

(IN-IPE) defined as IN-IPE(K) = E.sr.i E? Efdi- y.-“"(s)}^<is/(A'„„ 
where y*j{s) is the true signal, i.e. without measurement error eij{s). Also we 

use the out-of-sample IPE (OUT-IPE) defined as OUT-IPE(y) = Efci Eiet...... 

^ SZTTt 

I K I > / / K / I I I I / V . . I >H I I I I I . . 

U 


R- 


iStest set 

irrii v'S)}^(is/(10./Vsjm) and y^jis) is the true signal in the test set. The results are based 
on Nsim = 1000 simulations. 

In terms of estimation performance and prediction of ^ik{T) there is no alternative 
approach. On the other hand, in terms of model prediction error and prediction of a 
subject’s future curve there are two possible alternatives. One is the CM model of Chen 


stm I 
imj ' 


and Muller (2012). However due to the high computational expense required by their 
method, we have to restrict our comparison to few scenarios only: nii {8,...,12} 
number of repeated curves per subject. Case 3(b), and SNR = 1. The approach of 


Chen and Muller (2012) requires specification of several kernel bandwidths; due to the 


increased computation burden we use the pre-specified bandwidth h = 0.1 in smoothing 
both the mean and covariance functions. Even with these adjustments there is an order 


of magnitude difference in the computational cost (when n = 100 the method of Chen 


and Muller (2012) takes approximately 984 seconds, while our approach takes about 7 


seconds). As well, we also used the pre-specified level PVE = 0.95 to be consistent with 
our approach. A second alternative approach for prediction of a subject’s future visit 
trajectory is a rather naive approach; let the future prediction equal the average of all 
previously observed profiles for that subject. For example, the naive predictor of a profile 
of some subject in the test set is equal to the average of all profiles available in the training 
set for the corresponding subject. 

Table shows the results for different covariance models for ^ik{T), different num¬ 
ber of repeated curve measurements per subject, different SNRs, complex error process, 
and varying sample sizes; basically the results for Case 1 (a)-(c). Case 2(a)-(b), Case 3 
(b). Case 4 (a)-(b), and Case 5 (a)-(c). The analogous results corresponding to trivial 
covariance structure of the error process (white noise) are included in Table SI of the 
Supplementary Material. The performance of the proposed estimation (see columns for 
yU, 01 , and 02 of this table) is slightly affected by the covariance structure of model com- 
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ponents describing the dynamic behavior, ^jfc(T)’s, and the nnmber of repeated cnrve 
measnrements per subject, but in general is quite robust to the factors we investigated. 
As expected the estimation accuracy improves with larger sample size; see the 3x3 top 
left block of IMSE results corresponding to n = 100, n = 200, and n = 500. The results 
corresponding to white noise measurement error are consistent with these observations. 

To describe the prediction performance we consider both the prediction of ^ik{T)^s and 
the prediction of the curve responses; consider columns labeled .^i, .^25 IN-IPE and OUT- 
IPE of Table As expected the underlying covariance structure of ^ifc(T)’s does affect 
the prediction accuracy. In our investigation it seems that the exponential covariance 
structure (Exp) is most challenging; this most likely is due to the (very) large correlation 
coefficients used pi = 0.9 and p 2 = 0.5, which result in high temporal dependence even 
for the observations that are furthest apart T = 0 and T = 1, as T G [0,1]. Increasing the 
level of signal relative to the magnitude of noise (SNR) does improve the results somewhat. 
More importantly, when the error process has trivial covariance structure, the prediction 
accuracy is greatly improved. The results also show an interesting finding: increasing 
the number of repeated curve measurements rrii has a greater effect on the accuracy than 
increasing the sample size n. This observation should not be surprising, as with larger 
number of repeated measurements the estimation of the covariance of the longitudinal 
process ^jfc(T)’s improves and as a result superior prediction. 

Here is a summary of the comparison between the proposed method and available 
alternatives. As already pointed out, the comparison with CM is limited by the com¬ 
putational expense involved. In interest of space the results are presented in Table S4 
of the Supplementary Material: they show that the prediction using CM is more sensi¬ 
tive to the covariance structure of the underlying time-varying coefficients and that the 
accuracy can be improved by up to 50% by the proposed approach. As expected, the 
naive approach (results presented in the columns labeled IN-IPEnaive and OUT-IPEnaive 
of Table is very sensitive to the covariance structure of the latent longitudinal com¬ 
ponents, ^ik{T). When the correlation structure is simple (random effects model, REM, 
and exponential dependence, Exp) it yields much better results compared to when the 
correlation is more complex (nonparametric, NP). Not surprisingly, in all the cases stud¬ 
ied the prediction accuracy is inferior to the proposed method. Table provides insights 
into the how the computational time of our method scales with the number of subjects; 
for completeness the computational time is studied for all the cases in Tables S2 and S3 
of the Supplementary Material. 

The overall conclusion is that the proposed approach provides an improved prediction 
performance over the existing methods in a computationally efficient manner. 

6 DTI application 

Diffusion tensor imaging (DTI) is a magnetic resonance imaging technique, which is crucial 
in the diagnosis and progression assessment of various neuro-pathological diseases that 
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Table 1; Estimation and prediction accuracy results based on = 1000 simulations 






rrii ~ 

{8...., 

12} and SNR — 

1 





4>i 

4>2 

6 

6 

IN-IPE 

IN-IPE„,™ 

OUT-IPE 

OUT-IPE„ai„ 

NP (a) 

n = 100 

0.092 

0.003 

0.011 

0.338 

0.224 

0.406 

7.790 

0.988 

11.478 


n = 300 

0.031 

0.001 

0.009 

0.226 

0.138 

0.313 

7.773 

0.559 

11.349 


n = 500 

0.019 

0.001 

0.009 

0.199 

0.117 

0.288 

7.779 

0.455 

11.262 

REM (b) 

n = 100 

0.114 

0.027 

0.033 

0.376 

0.314 

0.328 

1.199 

1.011 

2.160 


n = 300 

0.040 

0.008 

0.013 

0.216 

0.162 

0.265 

1.197 

0.675 

2.160 


n = 500 

0.024 

0.005 

0.010 

0.181 

0.133 

0.247 

1.197 

0.571 

2.150 

Exp (c) 

n = 100 

0.095 

0.022 

0.030 

0.399 

0.540 

0.554 

1.528 

1.426 

2.520 


n = 300 

0.031 

0.007 

0.015 

0.289 

0.412 

0.508 

1.531 

1.143 

2.498 


n = 500 

0.019 

0.004 

0.013 

0.266 

0.383 

0.494 

1.530 

1.074 

2.492 





rrii ~ 

{15.... 

, 20} and SNR — 

1 





4>i 

<f>2 

6 

6 

IN-IPE 

IN-IPE„aive 

OUT-IPE 

OUT-IPE„™ 

NP (a) 

n = 100 

0.076 

0.002 

0.010 

0.180 

0.101 

0.238 

7.807 

0.477 

10.666 


n = 300 

0.026 

< 0.001 

0.009 

0.120 

0.065 

0.183 

7.796 

0.282 

10.728 


n = 500 

0.016 

< 0.001 

0.009 

0.108 

0.058 

0.173 

7.797 

0.242 

10.772 

REM (b) 

n = 100 

0.097 

0.025 

0.031 

0.272 

0.252 

0.232 

0.897 

0.612 

1.833 


n = 300 

0.034 

0.008 

0.013 

0.156 

0.132 

0.201 

0.896 

0.462 

1.841 


n = 500 

0.020 

0.005 

0.010 

0.135 

0.110 

0.194 

0.897 

0.440 

1.836 

Exp (c) 

n = 100 

0.080 

0.022 

0.030 

0.308 

0.417 

0.467 

1.240 

1.048 

2.147 


n = 300 

0.026 

0.006 

0.015 

0.233 

0.309 

0.444 

1.245 

0.938 

2.155 


n = 500 

0.016 

0.004 

0.012 

0.221 

0.285 

0.438 

1.246 

0.886 

2.129 





rrii ~ 

{8..... 

12} and SNR — 

5 





4>i 

4>2 

6 

6 

IN-IPE 

IN-IPE„,™ 

OUT-IPE 

OUT-IPE„ai^e 

NP (a) 

n = 100 

0.092 

0.005 

0.005 

0.328 

0.213 

0.363 

7.184 

0.958 

10.795 


n = 300 

0.031 

0.001 

0.002 

0.213 

0.124 

0.268 

7.170 

0.506 

10.662 


n = 500 

0.019 

0.001 

0.001 

0.187 

0.103 

0.242 

7.178 

0.402 

10.585 

REM (b) 

n = 100 

0.114 

0.037 

0.037 

0.404 

0.355 

0.293 

0.594 

0.958 

1.478 


n = 300 

0.040 

0.010 

0.011 

0.218 

0.167 

0.235 

0.595 

0.627 

1.476 


n = 500 

0.024 

0.006 

0.007 

0.180 

0.135 

0.219 

0.596 

0.529 

1.467 

Exp (c) 

n = 100 

0.095 

0.033 

0.033 

0.420 

0.573 

0.513 

0.922 

1.419 

1.838 


n = 300 

0.031 

0.010 

0.010 

0.290 

0.412 

0.466 

0.929 

1.109 

1.814 


n = 500 

0.019 

0.006 

0.006 

0.264 

0.378 

0.453 

0.929 

1.033 

1.807 





rrii ~ 

{15.... 

, 20} and SNR — 

5 





4>i 

<l>2 

fi 

?2 

IN-IPE 

IN-IPE„a™ 

OUT-IPE 

OUT-IPE„„™ 

NP (a) 

n = 100 

0.076 

0.003 

0.003 

0.174 

0.095 

0.205 

7.462 

0.441 

10.300 


n = 300 

0.026 

0.001 

0.001 

0.113 

0.057 

0.147 

7.453 

0.239 

10.359 


n = 500 

0.016 

< 0.001 

0.001 

0.101 

0.050 

0.136 

7.454 

0.200 

10.406 

REM (b) 

n = 100 

0.097 

0.035 

0.035 

0.300 

0.293 

0.205 

0.552 

0.568 

1.464 


n = 300 

0.034 

0.010 

0.010 

0.160 

0.140 

0.178 

0.552 

0.426 

1.473 


n = 500 

0.020 

0.006 

0.007 

0.136 

0.114 

0.172 

0.554 

0.405 

1.470 

Exp (c) 

n = 100 

0.080 

0.033 

0.033 

0.330 

0.451 

0.434 

0.895 

1.012 

1.779 


n = 300 

0.027 

0.009 

0.010 

0.236 

0.313 

0.410 

0.902 

0.901 

1.785 


n = 500 

0.016 

0.005 

0.006 

0.221 

0.284 

0.403 

0.902 

0.851 

1.763 


Table 2: Computational time (seconds) 


rrii ~ 

{8,..., 12} and SNR 

= 1 


n = 100 

n — 300 

n = 500 

NP (a) 

7.369 

15.892 

21.418 

REM (b) 

9.282 

11.347 

22.559 

Exp (c) 

7.514 

16.229 

17.109 
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affect brain white matter. Specifically, DTI provides different measnres of water diffusivity 
along brain white matter tracts; its use is instrumental especially for brain tracts where 
the diffusion is sensitive to any alteration of the tissues in the tract, like the ones caused 


by multiple sclerosis (MS), or white matter diseases more generally (see Alexander et ah 


(2007), Basser et ah (1994), Basser et ah (2000), Basser and Pierpaoli (2011)). 

In this paper we focus on studying a commonly used DTI measure - fractional anisotropy 
(FA) - along the corpus callosum tract in MS patients. FA of the water diffusion ranges 
from zero to one, with zero being the perfect isotropic diffusion in all directions. The 
DTI study involves 162 MS patients observed at between one and eight hospital visits; 
the total number of visits for all patients is 421 with mean and median equal to 2.599 
and 2 visits per patient, respectively. At each visit, FA is measured at 93 equispaced 
discrete locations along the corpus callosum tract, and thus can be viewed as a typical 
densely-sampled functional data. The data contains a total of 39 missing values; however, 
no modihcation is needed as the proposed method is not sensitive to mild missingness. 

Our main objective is twofold: (i) to understand the dynamic behavior of the FA 
prohle in MS patients over time and (ii) to make accurate predictions of the FA prohle 
for the observed subject’s future visit. Various aspects of the DTI study have been 


also considered in Greven et al. 

(2010 

), 

Goldsmith et al. 

2011 

), 

Staicu et al.f ( 

2012) 

and 

Pomann et al. 

(2013 

). In particular 

Greven et al.j ( 

201( 

]) used 

an earlier version of 


the study consisting of data from fewer and possibly different patients and a different 
registration technique. They studied the dynamic behavior of FA over time; however, 
their method cannot provide prediction for an already observed patient of the FA prohle 
at their next hospital visit. Our method quantihes the longitudinal changes in FA over 
time using a parsimonious model framework and provides accurate prediction of the full 
response prohle at patient’s future visit. The results have the potential to shed lights on 
the understanding of the MS progression over time as well as its response to treatment. 

To start with, for each subject we dehne the hospital visit time Tij by the diherence 
between the reported visit time and the subject’s baseline visit; thus Tn = 0 for all 
subjects i. Also the resulting values are scaled by the maximum value in the study so 
that Tij G [0,1] for all i and j. The sampling distribution of the visit times is right-skewed 


with rather strong skewness; for example there are only few observations T^’s close to 
1. The strong skewness of the sampling distribution of T^-’s has serious implications 
on the estimation of the bivariate mean /i(s,T); a completely nonparametric bivariate 
smoothing would results in unstable and highly variable estimation. This is probably 
why Greven et al. (2010) hrst centered the times for each patient i, {Tij ; j = 1, 
and then standardized the overall set {Tij : z,j} to have unit variance. 


However, such 

subject-specihc transformation of Tj^’s loses interpretability. In particular, while such 
transformations are appropriate for understanding the complex dynamics of these data 
they may not be suited for prediction at unobserved times. Specihcally, it is not clear 
what type of transformation to apply to a future visit time of a subject, in order to predict 
the response prohle at the respective future time - and this is crucial for our analysis. 
One way to bypass this challenge is to assume a simpler parametric structure along 
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the longitudinal direction, T, for the mean function; based on the exploratory anal¬ 
ysis we assume linearity in T. Specihcally we assume the varying coefficient model 
fi{s,Tij) = /io(s) -l- (3T{,s)Tij, where /io(') and /3 t(-) are unknown, smooth functions of 
s. We model and Pt{-) using a penalized univariate cubic spline regression with 
10 basis functions; the smoothing parameters are estimated using REML. Plots of the 
estimated mean, /i(s,T) = /io(s) + (3t{s)T, and the estimated slope function, are 

given in Figure SI of the Supplementary Material. The estimates seem to indicate that 
the population-level mean of FA, p(s,T), does not change much over longitudinal time, 
T; in other words the estimated slope function, is very close to zero. We use 

bootstrapping of the subjects to quantify the variability of the slope function estimator. 
In particular, pointwise conhdence band is constructed as 2.5% and 97.5% quantiles of 
the bootstrap estimates at each location of the tract, s, and the joint conhdence band is 
constructed as given in Crainiceanu et ah (2011) using B = 1000 bootstrap samples. As 
shown in Figure the joint conhdence band contains zero for all s. 

The observation that the mean of FA prohle varies very little over time is in agreement 
with prior literature; for example see Greven et al. (2010) who used a diherent data set 
with subjects observed also over a relatively short time frame. To gain more insight into 
this direction we consider a formal test to see whether /r(s, T) varies over time; using our 
mean model assumption this is equivalent to testing the null hypothesis that the slope 
function is null, Hq : Pt{s) = 0 against the alternative : /9r(s) % 0 for some s G [0,1]. 
We inspire from the ideas discussed in Park et al. (2015) to carry out the testing procedure. 

Let the test statistic he Q = J PrisYds - the norm of the slope estimator. To ap¬ 
proximate its null distribution, we use bootstrap of the residuals as described next. First, 
estimate the mean function by assuming the model fi{s,T) = /io(s) + /3t{s)T and under 
working independence; let Qots = J (dr^sYds be the observed test statistic using numeri¬ 


cal integration. Second, construct the bootstrap sample as = Fp(s) — /3r(s)Tjj. 

Third, obtain B bootstrap datasets, by bootstrapping with replacement from the above 
set; take B = 1000. For each data set we ht the above mean model and calculate the 
bootstrap test statistic, denoted by Qb, where b indexes for bootstrap dataset. Fourth, 
approximate p-value by > Qobs)/1000, where / is an indicator function. The 

null distribution of Q is shown in Figure S2 of the Supplementary Material. The proce¬ 
dure yields p-value = 0.206, which does not provide signihcant evidence to reject the null 
hypothesis that the mean function is constant over time. In light of this, we assume a 
T-invariant mean model /i(s,T) = /io(s) for the rest of the data analysis; the estimated 
Po(s) is shown in Figure 

We focus next on studying the variability in the data, and on using this variability 
to inform future prediction of a subject’s full trajectory. After demeaning the data, we 
estimate the marginal covariance and obtain the eigenbasis functions; using the preset level 
PVE = 0.95 results in iP = 10 eigenfunctions. Figurej^shows the leading 3 eigenfunctions 
that explain in turn 62.69%, 8.37% and 6.77% of the total variance, respectively; the rest 
of the estimated eigenfunctions are given in Figure S4 of the Supplementary Material. As 
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Figure 1: Left panel: 95% pointwise and joint confidence bands of the slope function 
I5t{s) of /i(5,r) using bootstrap; Right; final mean estimate, jji{s^T) = //o(s) 




Figure 2; Top; First three eigenfunctions of the estimated marginal covariance; Bottom: 
estimated mean function ^q{s) (gray line) ± 2y \k<t>k{s) (+ and — signs, respectively) 


k = 1 


k* 


2 


k»3 



tract 


tract 


tract 


shown in Figure a positively loaded first eigenfunction corresponds to a mean profile 
that is lower than the overall mean for all s, and the opposite for a negatively loaded one. 
We initially use a nonparametric approach to estimate the longitudinal covariance of 
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^ifc(T)’s for fc = 1,..., 10. Preliminary results (not shown here) indicate a simpler covari¬ 
ance model, which is the model we will present. More specifically, assume a random effects 
model iikiTij) = hoik + bukTij, where vaT{biik) = aff. for / = 0,1 and cov{boik, buk) = CToik- 
The fitted time-varying coefficient functions, ^ik{T), for k = 1,2 and 3 are shown in Figure 
1^ and the rest are shown in Figure S5 of the Supplementary Material. The estimated time- 
varying coefficient functions corresponding to suggest some longitudinal changes, 

but their signs remain the same across visit times except few cases with a relatively small 
magnitude. Overall, it implies that an individual mean profile tends to stay lower than 
the overall mean across all visit times if the first eigenfunction corresponding to that indi¬ 
vidual is positively loaded at baseline, and vise versa. In contrast to the estimated 

basis coefficient functions corresponding to the second eigenfunction, are mostly 

constant across visit times and imply little changes over time. 

One advantage of using a simpler, parametric model is that one can consider more 
formal testing about longitudinal effects. We consider this idea next, and study sepa¬ 
rately the null hypotheses, Hq/c : = cxoifc = 0 for A; = 1,..., 10 using the proxy data 

iw,ijk = — Afo(s)}0fc(s) as the ‘observed’ data. Bonferroni correction is used to 

appropriately account for multiple testings; specifically, we use the adjusted significance 
level, ttadj = 0.05/10 = 0.005, for testing each hypothesis such that familywise error rate 
is a = 0.05. Likelihood ratio test (LRT) and its null distribution as approximated by 
the mixture chi-squares are used for each hypothesis, Hofc- With the adjusted significance 
level, aadji we reject the null hypotheses, Hofc, for k = 1,3,4, and 5, with p-values less 
than 0.001. We conclude that there is nontrivial longitudinal dynamics of FA. To the best 
of our knowledge, this is the first attempt to carry out a formal testing for longitudinal 
changes in functional observations over time. 



Finally, we assess the goodness-of-fit and prediction accuracy of our final model. 
For the goodness-of-fit we use the in-sample integrated prediction error (IPE); IPE= 
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Ei=i I{yijis)-^ijis)Vds/{J2i=imi}, where ^^(s) = ^o(s)+Efc=i(^oifc+&iifcTij)0fc(s), 
and lij(s)’s are the observed curve data. We compare our results with two other compet¬ 
itive approaches: Greven et ah (2010) and Chen and Muller (2012). The square root of 


the in-sample IPEs are 2.31 x 10“^ for our model, 2.66 x 10“^ for Greven et ah (2010), 
and 3.76 x 10“^ for Chen and Muller (2012). For all of three approaches we use the same 
mean function estimate, fi{s,T) = fto^s), and the same pre-specified level PVE = 0.95; 
for Chen and Muller (2012) we use the pre-set bandwidth h = 0.1. 

To assess the prediction accuracy we use leave-the last-curve-out integrated prediction 
error; J2l=i I{^imiis) — Ej|^*"**'(s)}^(is/106, where is the predicted curve for 

the ith subject using the fitted model based on all the data less the rriith curve of the 
ith subject. Specifically, for each of 106 patients with more than one hospital visits, we 
remove the FA measurements taken at the last hospital visit from the dataset; the obtained 
dataset is used to fit the proposed model and predict the removed curve measurements. 
Figure 1^ shows such predicted curves obtained using our model and the naive 

model for three randomly selected subjects’ FAs at their last visits. The alternative 
methods are Chen and Muller (2012) and the naive approach described in Section]^ The 
square root of the leave-the last-curve-out integrated prediction errors are 3.48 x 10“^ 
for our model, 8.71 x 10“^ for Chen and Muller (2012), and 3.52 x 10“^ for the naive 
approach. Surprisingly, the naive estimation has relatively good prediction performance, 
and better than Chen and Muller (2012). Nevertheless these results confirm that, in this 
short term study of MS, there is a small variation of FAs within subjects over time. 


Figure 4: Predicted values of FA for the last visits of three randomly selected subjects; 
actual observations (gray); predictions using our model (black solid) and using the naive 
approach (black dashed) 

subject A, visit 6 subject B, visit 2 subject C, visit 3 
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7 Discussion 


In this paper we propose a novel parsimonious modeling framework for repeatedly observed 
functional data, due to a longitudinal design. Accounting for the dependence within 
the subject as well as for the longitudinal design is crucial for making full prediction 
at future visits, or assessing the dynamics of the underlying process over time. However, 
current methods either ignore the dependence or are too complicated and computationally 
intensive. 


Using Shiny (Chang et ah, 2015) we implemented interactive plots to help visualize 


longitudinal functional data as well as the various components and prediction obtained 
using the proposed method; these tools will shortly be accessible on GitHub. R code for 
illustrating the procedure is publicly available at http: //www4. stat. ncsu. edu/~staicu/ 
software/MLFD_Rcode.zip 


Appendix 

Proof for Theorem 

We prove this theorem by hrst proving the pointwise consistency and then proving the 
Hilbert-Schmidt norm consistency of the covariance estimator. 

Pointwise consistency. 

Recall S(s, s') = EILi EJi ^b(') = Fix tem¬ 

porarily s, s' G S and dehne Ai{T) = Yi{s,T)Yi{s',T)] then A is iid as the random 
variable A dehned by A{T) = Y (s, T)Y (s', T). Let {T^ : d = 1,..., D} be a set of unique 
Tjj’s for all i and j in increasing order such that Ti < T 2 < ... < Td with Tq = 0 and 
Td = 1. D denotes the total number of unique Tj^’s. Due to the sparse assumption of 
the longitudinal design (m* < 00 for alH = 1,..., n) we obtain that D diverges with the 
sample size n. It follows that 



d=l 


the hrst equality is obtained by using a different way of counting the summands of S(s, s'), 
and the second one by multiplying and diving by D. Let Mi = ^^^=1 '^1 
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and Pd = P(T G {Td-i, Td\), where by an abuse of notation T denotes the random variable 
with sampling distribution g{T). Here we dehned Vd = {D/Mi) ^i{Td)I{Tij G 

{Td-i,Td\)', Vd depends on s,s', though the dependence on s and s' is suppressed. This 
S(s,s') = V = D~^J2d=i^d- Vd’s are correlated over d; thus to show that ‘S(s,s') is 
consistent’ it is sufficient to show that the average of dependent variables V^’s is consistent. 

Take the latter problem and study hrst the covariance of Vd and Vd'. We have: 


coY{Vd,Vd') = E[VdVd'] - E[H,]E[H,,] 

- { } {Ej ■ c((s, n), (s', n)) ford = <i' 

= < 

r M f)^ 'i r 'i 

- {^’^}{Pdft-c((s,rj),(s',Tj)).c((s,T,.),(s',rj,))} forded'. 


For simplicity in notation, denote the variance of Vd with cr^, and the covariance of Vd 
and Vd' with add'- Under the assumptions (Al.) (A3.), it holds that E(Vrf), aj, and add' are 
hnite. Following Theorem 5.3 (Boos and Stefanski, 2013, p.208), we show the consistency 
of V by showing that the following converges to 0 in probability as D diverges: 


L 


d=l d'j^d 


d=l 


^Et|E[/1(T)"|T]] 


+ 


iwi ~ ig^}ET[ET.lE|di(r).4(r)|T,r|]] 


f M2 _ 

I M 2 M: 


D 

r-}j2PhnA{T)^\T = T,i 

^ d=l 


^2 ,..2 
^2{^{S,S)} 


As integration of a continuous function in a compact interval is hnite, under the assump¬ 
tions andpp] Et[E[H(T)2|T]], Et[Et'[E[H(T)H(T')|T,T']]], and S(s,s') are £- 


nite. For example Et[E[H(T) 2|T]] = f.^^(T)E[H(T)2|T] dT < s up^^(T) E[H(T)2|T]dT 
is finite because (i) snprpg(T) is finite by the assumption (A2.) and (ii) J.j-E[A{T)‘^\T]dT 
is fi nite b ecause E[A(T)2|T] is continuous and finite in T by the assumptions (A2.) 


(A3. 


, With the same argument we can show that Yl!d=iPd ' E[A(T)2|T = Td\ < 
E[A(T)2|T]] is also hnite. Here we use the results that M 1 /M 2 = 0(1) and 


and 

SUPrfPrfET 

each term diverges to 00 as P diverges (since D < Mi). It implies that 


1 

P2 




D 


d=l 


d=l d'^d 


CPdd' 


0 as P —?• cx). 
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Using Theorem 5.3 (Boos and Stefanski 
converges in probability to D~^ 


2013), we obtain that S(s, s') = U = D ^ 'l2d=i ^ 
{^d] = Yld=iPd ■ c((s,Td), (s',Td)) as n diverges; 


where the latter expression is equal to S(s, s'). 

Hilbert-Schmidt norm consistency 

Let Rijdv =< YijHv > YijliTij e {Td-i,Td]) =< Yi{-,Td),e^ > Yi{-,Td)I{Tij e 
{Td-i,Td\), where {e^ : n > 1} is any orthonornial basis. Dehne the sample covariance 
operator associated with S(s,s') as follows: 

D n m.i 

h(x} = d-'J2wYY< ^ 6 (U-i.n]) 

d=l ^ ^ i=l j=l 
D n rrii 

= < y^{-^Td),x> Y,{;Td)m^ e {Td.,,Td]), x e l\ 

d=l ^ ^ i=l j=l 

Under the assumptions (Al.), (A2.)| and (A4.), we have that the following sequence of 
ine/equalities are true: 


E 


|S(s,s')-E[S(s,s')]l 


= E 


H-E[H]\\"] =E\Y,\\H{e,)-E[H{e 


V=1 


= E 


D n m. 

D 


E |I^’‘ E E E {< '=« > 6 (n_i,ni) 


_ v=l d=l i=l j=l 


E[< Yij,ey > YijI{Tij G {Td-i,Td])] 


(using Equation (2.2) from [Horvath and Kokoszka (2012, p.22)) 


D n rrii 


e E A'EAEEh^ 


Ml 

v=l d=l 2=1 j=l 


Hjdv 


D D 


D 


n rn. 


EI D ^2 ^ {R^dv RiRijdv] }, {r^j'^v YiRrj'd'v] } > 


v=l d=l d'=l 


* J 


D D 


2^2^ Mf 

=1 d'=i 1 


d=i d'=i 
D D 


n n rrii oo 


<vr=EE 


d=l d'=l 


^ ^ ^ ^ ^ ^ ^ ^ ^ ^ijdv '^[Rijdv]'! Ri'j'd'v ^ ^ 

i i' j j' v=l 

n rrii oo 

^ ^ ^ ^ ^ ^ ^ ^ ^ ^ <[ Rijdv Ri'j^d'v ]> ^ 


2 i' j j' 2;=1 


(17) 
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D D n n rrii oo 

d=l d'=l i i' j j' v=l 


■^E[<C Rijdy^Riijifiiy >] < E[/?j/j/(;;/^,] > 


< 


(M 2 - Ml) ■ sup^Pd 1 2(M2 - Ml) 1 

Mf +Mr+ MJ 


E 


\\ y{:Td 


0, 


as n diverges. It implies that E 


|S(s,y)-E[S(s,s')]| 


= E 


||S(s, s') — S(s, s' 


converges to 0 as n diverges, and the Hilbert-Schmidt norm consistency, ||S(s,s') 
S(s, s')|| 0, is implied by Markov inequality. 


Proof for Theorem 

First we show that for each k 


supj\^W,ijk — ^W,ijk\ 0, 


as n diverges. 

We have that the following sequence of inequalities hold; 
supj\^w,ijk - ^w,ijk\ = snpj j Yi{s,Tij)^k{s)ds - j Yi{s,Tij)(j)k{s)ds 
= snpj j Yi{s,Tij){'$k{s) - (j)kis)}ds 

< sup^- j Yi{s,Tij) {0fc(s) - 0fc(s)} ds 

< SUPjSUp,g[o,i] 




< SUPi 
= SUPj-^ 


Yi{s,Tij) J {(j)k{s) - (j)k{s) }ds 

1/2 


Yi{s,Tij) X ly {0fc(s) -(j)k{s)yds 

Yi{s,Tij) X || 0 fc(-) - 0 fc(-)IU 


(18) 


(by Cauchy-Schwartz ineq. 
(19) 


where sup 




Yi{s,Tij) 


is absolutely bounded almost surely under the assumption 


(A6.: 


and ||0A:(') — 0fc(')lls converges to 0 as shown in Corollary that is proved in the Supple¬ 
mentary Material. This concludes the first part of the proof. 

Recall tha t GkiT ,T') = cov{^ii{T)Rii.{T')} is the true covariance. It is already shown in 
Yao et ah (2005) the uniform consistency of its local linear estimator, Gw,k{T,T'), when 


Gw,k{T,T') is obtained with ^w,ijk^] specihcally, Gw,k{T,T') is obtained by the local lin¬ 
ear smoothing of {Gw,ikiTij,Tij/) = ^w,ijk^w,ij'k ■ j 7 ^ j'}- Here we show a similar result 
when the local linear estimator is obtained with ^w,ijk instead of ^w,ijk- 
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The sample covariance of ^w,ijk^ is as follows: 


GikiTij, Tiji) 


iw,ijkiw,ij'k 

{{iw,ijk — iw,ijk) + iw,ijk)} X {{iw,ij'k “ iw,ij'k) + iw,ij'k] 
Gw,ikiTij,Tij/) + {iw,ij'k — iw,ij'k){iw,ijk — ^W,ijk) 

+ ^W,ij'k{iw,ijk — iw,ijk) + iw,ijk{iw,ij'k — iw,ij'k) 


By the uniform consistency of ^w,ijk given in Equation (18), the local linear estimator, 
Gk{T, T'), obtained by smoothing {(T^-, Tij^), GikiTij, T^y) : i = 1,..., n, j 7^ f} is asymp¬ 
totically equivalent to the local linear estimator, Gw,k{T,T'), obtained by smoothing 
Gw.iki TiiyTii^ys. Furthe rmore as Gw,kiT,T') is shown to be a uniformly consistent esti¬ 
mator ( Yao et ahj 2005), the uniform consistency of Gfc(T,T') follows. Furthermore the 
Hilbert-Schmidt norm consistency is implied by the uniform consistency. 


Proof for Conjecture 

In the following we prove the consistency of the predicted trajectory, Yi(s, T), for the case 
when ^w,ijkS are uncorrelated over k. This is basis of our intuition for Conjecture]^ 

To show the consistency of the predicted trajectory, Yiis,T) = J2k=i^ikiT)((>kis) = 
Yl!k=iYld=iCiki'<PkiiT)(i)kis), we hrst need to show that the truncated true trajectories, 
Yi^^is,T) = J2k=i'l2f=iCiki4’kiiT)(j)kis), are consistent estimators of the corresponding 
(not-truncated) true trajectories, Yiis,T) = Yl'^=iYll^iCiki'>PkiiT)(i)kis)- 

Recall that {4>k{s)}k is set as the eigenbasis of the marginal covariance S(s,s'). It 
follows that for as R' —)■ cx) we have that ^k4>kis)4>kis') converges to 0 uniformly 

over s and s' by Mercer’s theorem. And it implies that Y1T=k+i ^k4>kis)‘^ also converges 
to 0 uniformly over s, and furthermore that 


sup^ ^ = sup^varj^ ^ ^w,ijk4>kii 

k=K+l k=K+l 


= sup^varj^ ^ {GikiTij) + eijk}(j)kis) 
k=K+l 

00 00 00 

= sup^var Y E 

k=K+l 1=1 

00 00 

e e 0fc('S) 


k=K+l 


= sup^var 


k=K+l 1=1 
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oo 


oo 


E 


var 




WAk 


Y1 ^e%0fc(5) 


k=K+l 1=1 

(By total law of variance) 


k=K+l 


= sup^E 


OO OO 

K E E CifczV^fcz(Eij)0fc('S) j’ 

k=K+l l=l 

oo oo 

E E Cikl'4^kl{Tij^ j’0fc(s) 1^1 


2i 


E 


k=K+l 1=1 


+ Y1 ^e,fc0fc(s) ^ 0, 

( 20 ) 


k=K+l 


as K diverges. Because all terms in Equation (20) are non-negative and the summation of 
those terms converges to 0, it is implied that each term also converges to 0 as if diverges. 

And by Markov inequality, E<^ Y1T=k+i Ciki'i/Jki{Tij)(pkis)\$,w,ik \ converges to 0 as if 


k=K+l 1=1 


diverges. Notice that 

OO OO OO OO 

e{ E E E Ee 

" " k=K+l 1=1 

OO OO 

^ ^ Cikl'4^kl(Tij)4^kA^ 

k=K+l 1=1 

K 

= %{s,Tij) - '^Zk{Tij)4>k{s) —t 0 as if —)■ oo, 


k=l 


where ^ik(T) = YA^iCikii^ki is the full target trajectory that we want to predict in the 
second FPCA step. ^ ^ 

Let iU T) = E&i Cikii’ki be the truncated true trajectory of iik{T). By Lemma 3 of 


Yao et al. 


(2005), supt’ E[^4(E) —iik{T)Y —t 0, and by Markov inequality it implies that 
for each k ^4(L") converges to ^ik{T) in probability as Lx diverges. Thus by combining 
two results, (1) J2k=i^ik(Tij)4^kA) converges to Yi{s,Tij) in probability as if diverges 
and (2) for each k ^4 converges to ^ik{T) in probability as diverges, we show that 
the truncated target trajectories, Y/^^{s,T)'s, converge to the corresponding full target 
trajectories, Yi{s,Tys. 

Finally we show that the predicted trajectory, Yj(s, T), is a consistent estimator of the 
full target trajectory, Y4s,T). Note that |Y4s,T) - Y.^^(s,T)| < |Yi(s, T) - Y.^^(s, T)| + 
|YA'^(s,T) — Yi{s,T)\. We already show that the second term converges to 0 as if and 
Lfc’s diverge. And by Slutsky’s theorem and the consistency of the estimators of all model 
components, the hrst term, \Yi{s,T) — YL^{s,T)\, converges to 0 in probability as n 


28 












diverges. Thus for each (s,T) G iS x T, 

K Lk oo oo 

~ ^ ^ ^ ^ Cikl'^kl{T)(pk{s) ^EE Ciki'ipki{T)4>k{s), (21) 

k=l 1=1 k=l 1=1 

as n, K and L^’s diverge. 
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